40 Survey Data

Description: Survdat (Survey database)

Found in: State of the Ecosystem - Gulf of Maine & Georges Bank (2017, 2018, 2019, 2020), State of the Ecosystem - Mid-Atlantic (2017, 2018, 2019, 2020)

Indicator category: Database pull

Contributor(s): Sean Lucey

Data steward: Sean Lucey

Point of contact: Sean Lucey

Public availability statement: Source data are available to qualified researchers upon request (see “Access Information” here). Derived data used in SOE reports are available here.

40.1 Impact Summary

Description of why this indicator is important and what is going on in the Northeast related to this information.

40.1.1 Plotted data

Code used to create the figure below can be found here.

Figure 40.1: Spring (left) and fall (right) surveyed biomass in the Mid-Atlantic Bight. Data from the NEFSC Bottom Trawl Survey are shown in black, with NEAMAP shown in red.

40.1.2 Raw data

The raw data are available in the ecodata package.

40.2 Methods

The Northeast Fisheries Science Center (NEFSC) has conducted standardized bottom trawl surveys in the fall since 1963 and in the spring since 1968. The surveys follow a stratified random design. Fish species and several invertebrate species are enumerated on a tow-by-tow basis (???). The data are housed in the NEFSC’s survey database (SVDBS), which is maintained by the Ecosystem Survey Branch.

Direct pulls from the database are not advisable because there have been several gear modifications and vessel changes over the course of the time series (???). Survdat was developed as a database query that applies the appropriate calibration factors to produce a seamless time series since the 1960s. As such, it is the base for many of the other analyses conducted for the State of the Ecosystem report that involve fisheries-independent data.

The Survdat script can be broken down into two sections. The first pulls the raw data from SVDBS. While the script can pull data from more than just the spring and fall bottom trawl surveys, only the spring and fall data are used for the State of the Ecosystem reports. Survdat identifies the research cruises associated with the seasonal bottom trawl surveys and pulls the station and biological data. Station data include tow identification (cruise, station, and stratum), tow location and date, and several environmental variables (depth, surface/bottom salinity, and surface/bottom temperature). Stations are filtered for representativeness using a station, haul, gear (SHG) code for tows prior to 2009 and a tow, operations, gear, and acquisition (TOGA) code from 2009 onward. The codes that correspond to a representative tow (SHG <= 136 or TOGA <= 1324) are the same as those used by assessment biologists at the NEFSC. Biological data include the total biomass and abundance by species, as well as lengths and number at length.
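The representativeness filter can be illustrated with a minimal Python sketch. This is not the Survdat script itself (which is written in R); the field names (`year`, `shg`, `toga`) and the storage of codes as integers are assumptions made for illustration. Only the thresholds and the 2009 cutoff come from the text.

```python
# Thresholds and cutoff year quoted in the text; field names are hypothetical.
SHG_MAX = 136
TOGA_MAX = 1324
TOGA_START_YEAR = 2009


def is_representative(tow):
    """Return True if a tow passes the SHG (pre-2009) or TOGA (2009 onward) check."""
    if tow["year"] < TOGA_START_YEAR:
        return tow["shg"] <= SHG_MAX
    return tow["toga"] <= TOGA_MAX


# Made-up example tows, two per coding scheme.
tows = [
    {"station": 1, "year": 1995, "shg": 111, "toga": None},
    {"station": 2, "year": 1995, "shg": 137, "toga": None},
    {"station": 3, "year": 2012, "shg": None, "toga": 1324},
    {"station": 4, "year": 2012, "shg": None, "toga": 1325},
]

representative = [t for t in tows if is_representative(t)]
print([t["station"] for t in representative])  # [1, 3]
```

Keeping the year test inside a single function means each tow is checked against exactly one of the two codes, so a missing value in the other code does not matter.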

The second section of the Survdat script applies the calibration factors. Four calibration factors are applied (Table 40.1), and they are pulled directly from SVDBS. Vessel conversions were made from either the NOAA Ship Delaware II or NOAA Ship Henry Bigelow to the NOAA Ship Albatross IV, which was the primary vessel for most of the time series. The Albatross IV was decommissioned in 2009, and the Bigelow is now the primary vessel for the bottom trawl survey.

Table 40.1: Calibration factors for NEFSC trawl survey data

Name                  Code  Applied
Door Conversion       DCF   <1985
Net Conversion        GCF   1973 - 1981 (Spring)
Vessel Conversion I   VCF   Delaware II records
Vessel Conversion II  BCF   Henry Bigelow records
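As a rough illustration of how the conditions in Table 40.1 select factors for a given record, the Python sketch below returns the codes that would apply. The function name, the season and vessel strings, and the inclusive reading of the 1973 - 1981 range are assumptions. The sketch deliberately does not show how each factor adjusts catch values, since that is not described here.

```python
def applicable_factors(year, season, vessel):
    """List the Table 40.1 calibration factor codes that apply to one record."""
    codes = []
    if year < 1985:
        codes.append("DCF")  # door conversion: records before 1985
    if season == "SPRING" and 1973 <= year <= 1981:
        codes.append("GCF")  # net conversion: spring, 1973 - 1981 (assumed inclusive)
    if vessel == "DELAWARE II":
        codes.append("VCF")  # vessel conversion I
    if vessel == "HENRY BIGELOW":
        codes.append("BCF")  # vessel conversion II
    return codes


print(applicable_factors(1975, "SPRING", "ALBATROSS IV"))  # ['DCF', 'GCF']
print(applicable_factors(1990, "FALL", "DELAWARE II"))     # ['VCF']
print(applicable_factors(2015, "FALL", "HENRY BIGELOW"))   # ['BCF']
```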

The output from Survdat is an RData file that contains all the station and biological data, corrected as noted above, from the NEFSC Spring Bottom Trawl Survey and NEFSC Fall Bottom Trawl Survey. The data are stored as a data.table, an extension of the base data.frame (https://cran.r-project.org/web/packages/data.table/data.table.pdf). A series of tools has also been developed for working with the Survdat data set (https://github.com/slucey/RSurvey).

40.2.1 Data sources

Survdat is a database query of the NEFSC survey database (SVDBS). These data are available to qualified researchers upon request. More information on the data request process is available under the “Access Information” field here.

40.2.2 Data extraction

Extraction methods are described above. The R code found here was used in the survey data extraction process.

40.2.3 Data analysis

The fisheries-independent data contained within Survdat are used in a variety of products; the more complicated analyses are detailed in their own sections. The most straightforward use of these data is for the resource species aggregate biomass indicators. For the purposes of the aggregate biomass indicators, fall and spring survey data are treated separately. Additionally, all length data are dropped, and species separated by sex at the catch level are merged back together.
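The two preparation steps (dropping length data and merging sexes) might look like the following Python sketch. The record layout is an assumption: one row per station, species, sex, and length bin, with the catch-level biomass repeated on every length row. The species codes and values are made up.

```python
from collections import defaultdict

# Hypothetical layout: catch-level biomass repeated on each length row.
rows = [
    {"station": 1, "species": 73, "sex": 1, "length": 40, "biomass": 2.0},
    {"station": 1, "species": 73, "sex": 1, "length": 45, "biomass": 2.0},
    {"station": 1, "species": 73, "sex": 2, "length": 50, "biomass": 3.5},
    {"station": 1, "species": 15, "sex": 0, "length": 20, "biomass": 1.0},
]

# Step 1: drop length data by keeping one catch record per station/species/sex.
catch = {}
for r in rows:
    catch[(r["station"], r["species"], r["sex"])] = r["biomass"]

# Step 2: merge the sexes by summing biomass per station/species.
merged = defaultdict(float)
for (station, species, _sex), biomass in catch.items():
    merged[(station, species)] += biomass

print(dict(merged))  # {(1, 73): 5.5, (1, 15): 1.0}
```

Deduplicating before summing matters under this layout: summing the raw rows directly would count the catch-level biomass once per length bin.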

Since 2020, survey strata have been characterized as being within an Ecological Production Unit (EPU) based on where at least 50% of the area of the stratum was located (Figure 40.2). While this does not create a perfect match for the EPU boundaries, it allows us to calculate the variance associated with the index as the survey was designed.

Figure 40.2: Map of the Northeast Shelf broken into the four Ecological Production Units by strata. Strata were assigned to an EPU based on which one contained at least 50% of the area of the strata.
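The 50% area rule for assigning strata to EPUs can be sketched in Python as follows. This is an illustration under assumptions, not the code used for the report: the stratum identifiers, EPU labels, and overlap areas are hypothetical, and the overlap areas are taken as already computed by some spatial intersection step that is not shown.

```python
def assign_epu(area_by_epu):
    """Return the EPU holding at least 50% of a stratum's area, or None."""
    total = sum(area_by_epu.values())
    if total <= 0:
        return None
    for epu, area in area_by_epu.items():
        if area / total >= 0.5:
            return epu
    return None


# Hypothetical overlap areas (same units within a stratum) by EPU label.
overlap = {
    "01010": {"MAB": 900.0, "GB": 100.0},
    "01130": {"GB": 450.0, "GOM": 550.0},
}

assignment = {stratum: assign_epu(areas) for stratum, areas in overlap.items()}
print(assignment)  # {'01010': 'MAB', '01130': 'GOM'}
```

In this sketch an exact 50/50 split goes to whichever EPU is listed first, and a stratum spread across three or more EPUs with no 50% share is left unassigned. How such cases were actually handled is not stated here.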

Prior to 2020, Survdat was first post-stratified into EPUs by labeling stations by the EPU they fell within using the over function from the rgdal R package (???). Next, the total number of stations within each EPU per year was counted using unique station records. Biomass was summed by species per year per EPU. Those sums were divided by the appropriate station count to get the EPU mean. Finally, the mean biomasses were summed by aggregate groups. These steps are encompassed in the processing code, which also includes steps taken to format the data set for inclusion in the ecodata R package.
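The pre-2020 averaging steps can be traced with a small Python sketch. It assumes the stations have already been labeled with an EPU (the spatial overlay is not reproduced), and the record fields, species names, and group names are hypothetical. It is a simplified stand-in for the processing code, not a copy of it.

```python
from collections import defaultdict

# Hypothetical catch records, already labeled with the EPU of each station.
catch = [
    {"year": 2010, "epu": "GB", "station": 1, "species": "A", "group": "X", "biomass": 4.0},
    {"year": 2010, "epu": "GB", "station": 1, "species": "B", "group": "X", "biomass": 2.0},
    {"year": 2010, "epu": "GB", "station": 2, "species": "A", "group": "X", "biomass": 6.0},
    {"year": 2010, "epu": "GB", "station": 2, "species": "C", "group": "Y", "biomass": 1.0},
]

station_sets = defaultdict(set)   # unique stations per (year, EPU)
species_sum = defaultdict(float)  # summed biomass per (year, EPU, species)
species_group = {}                # species -> aggregate group
for r in catch:
    key = (r["year"], r["epu"])
    station_sets[key].add(r["station"])
    species_sum[key + (r["species"],)] += r["biomass"]
    species_group[r["species"]] = r["group"]

# Divide each species sum by the station count, then sum the means by group.
group_mean = defaultdict(float)
for (year, epu, species), total in species_sum.items():
    mean = total / len(station_sets[(year, epu)])
    group_mean[(year, epu, species_group[species])] += mean

print(dict(group_mean))  # {(2010, 'GB', 'X'): 6.0, (2010, 'GB', 'Y'): 0.5}
```

Because the divisor is the count of all unique stations in the EPU and year, a species caught at only one of two stations (species C above) still has its sum divided by two.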